Abstract:
Method and device for the synchronization and parallel execution of trace instructions on a segmented (pipelined) RISC processor. The invention consists of a device whose internal structure, based on a segmented processor, eliminates the execution-time overhead introduced by the code instrumentation used to measure worst-case execution time. To this end, the device uses a specific instruction code for the instrumentation, which is interpreted as the trace qualifier of the instruction that precedes it and which uniquely identifies the moment at which said instruction executes. The proposed device executes each trace instruction in parallel and in synchrony with the traced instruction that precedes it, and makes that execution conditional on the traced instruction completing without being affected by bubbles. (Machine-translation by Google Translate, not legally binding)
Publication number: ES2697548A1
Application number: ES201830266
Filing date: 2018-03-20
Publication date: 2019-01-24
Inventors: Silva Antonio Da;Polo Oscar Rodriguez;Hellin Agustin Martinez;Espada Pablo Parra;Prieto Sebastian Sanchez
Applicant: Universidad Politecnica de Madrid;Universidad de Alcala de Henares UAH;
Main IPC class:
Description:

[0001]
[0002]
[0003]
[0004] TECHNICAL FIELD
[0005]
[0006] The invention belongs, in general, to the Electronics, Computing and Telecommunications (ICT) sector, with specific application in critical systems in the aerospace, defense and high-reliability sectors.
[0007]
[0008] BACKGROUND OF THE INVENTION
[0009]
[0010] Several inventions have been identified that propose solutions to facilitate the tracing of instructions, but they differ from the present invention.
[0011]
[0012] Patent application US 5996092 A, "System and method for tracing program execution within a processor before and after a triggering event", makes it possible to start and stop the tracing of instructions using a trace processor that works in parallel with the processor executing the instructions themselves. After detecting the trace start time by means of a specific instruction, the trace processor stores in a shared memory information about the entire instruction execution sequence up to the moment at which it detects the instruction to stop. In the present device there is neither a shared memory nor a parallel trace processor, and the trace is based on instrumenting the code: at the specific points of the code where a point trace is wanted, trace instructions are added that uniquely identify the point to be traced, without resorting to the program counter. The trace instruction is executed in parallel with the preceding instruction to be traced, by introducing redundancy in the necessary parts of the processor pipeline, and the result of its execution is a write to an output register, where analysis hardware captures it so that the specific instant at which the instruction was executed is recorded. The present invention thus takes a different approach, in which the selective instrumentation of instructions located anywhere in the code makes it possible to obtain the worst-case execution time of each of the system's functions. This type of instrumentation is used by commercial tools such as RapiTime in critical avionics systems (G. Bernat et al., "Identifying Opportunities for Worst-case Execution Time Reduction in an Avionics System," Ada User Journal, Volume 28, Number 3, 2007, pp. 189-194).
[0013] However, applying it with the processors available on the market has the main disadvantage of the overhead it introduces in execution time, which the presented invention eliminates.
[0014]
[0015] With respect to US patent application 2017147472 (A1), "Systems and methods for a real time embedded trace", the main difference is that that system traces the jump instructions autonomously. In the invention presented herein, the instructions to be traced are defined by selective instrumentation techniques, which insert a trace instruction after each instruction to be traced; what this invention does is execute the two in parallel and in synchrony. As already described, this instrumentation technique is used in critical systems and makes it possible to trace blocks of code in order to evaluate their worst-case execution time. The transition between some of these blocks, such as that between an else block and the block that follows it, does not involve a jump in execution, so it would not be detected by the solution of US 2017147472 (A1).
[0016]
[0017] Patent US 6513134 (B1), "System and method for tracing program execution within a superscalar processor", presents an improvement over US 5996092 A, making it possible to work with superscalar processors running at high frequencies, above 400 MHz. To this end, it uses an encoding of the information to be traced that reduces the space needed to store it in the trace buffer provided as output. As in that earlier patent, blocks of instructions are traced in order to analyze their execution, but with a more flexible way of triggering the trace and with a trace encoding that saves both stored information and pins. This patent therefore does not avoid the overhead of code instrumentation techniques that affect the entire system, as the claimed invention does; rather, it aims to optimize a tracing mechanism based not on instrumentation but on the detection of events that match predefined conditions.
[0018]
[0019] It is therefore concluded that existing instruction tracing systems make it possible to program specific trace-triggering events in order to collect trace information limited to a certain interval before and/or after the event. These methods are somewhat rigid, in that the number of blocks that can be traced in each execution is always limited, and they do not adapt well to the code instrumentation techniques used to characterize worst-case execution time in critical systems, such as the technique used by the aforementioned RapiTime tool. Since applying code instrumentation on current processors introduces execution-time overhead, the presented invention, designed to eliminate that overhead, provides an improvement with a specific objective within this field.
[0020]
[0021] DESCRIPTION OF THE INVENTION
[0022]
[0023] In a first aspect of the invention, a device for the parallel processing of program instructions and trace instructions is disclosed. The device comprises:
[0024] • an instruction search stage, which in turn comprises:
[0025] ◦ a module for calculating the address of the instruction; and
[0026] ◦ an instruction search module with a dual read port;
• a duplicate decoding stage;
[0027] • a trace pipeline for processing trace instructions only;
[0028] • an output register for the trace;
[0029] • a data path, which in turn comprises a set of multiplexers;
• a controller of the data path, which in turn comprises inputs and outputs that control the multiplexers, the loading of the registers associated with the different stages, and the output register for the trace;
[0030] where the data path controller is configured to determine, depending on its state and the value of its inputs, the value of the outputs sent to the multiplexers of the data path, in such a way that a trace instruction is executed in synchrony with the preceding instruction, said execution becoming effective during the last stage of the trace pipeline.
[0031]
[0032] In one embodiment of the invention, the controller handles the following instruction sequences S1, S2 and S3:
[0033] S1: corresponds to instruction-trace pairs in which the instructions to be traced are always loaded into the "INSTRUCTION N" element, while the corresponding trace instructions are loaded into the "INSTRUCTION N + 1" element;
[0034] S2: corresponds to a sequence of instructions that are not traced, so that the elements "INSTRUCTION N" and "INSTRUCTION N + 1" are always loaded with instructions (and not traces);
[0035] S3: corresponds to two trace-instruction pairs in which the trace instructions are loaded in successive cycles into the "INSTRUCTION N" element, while the instructions to be traced are loaded in those same cycles into the "INSTRUCTION N + 1" element.
[0036]
[0037] The processing device executes the sequence S1 during two clock cycles, T and T + 1, in such a way that the instructions stored at addresses X + 1, X + 3 and X + 5 are the trace instructions of the instructions that precede them, located respectively at addresses X, X + 2 and X + 4.
[0038]
[0039] The processing device executes the sequence S2, in which no trace instructions are loaded for two cycles. During the first cycle T it is detected that neither of the two instructions loaded in the decoding stage is a trace, and therefore the signals "N_ES_TRAZA" and "N_1_ES_TRAZA" are both "0"; in cycle T + 1 the controller is in the state called "PENDING INSTR", in which the pending instruction located in the second decoding unit is directed to stage 3 of the processor instruction pipeline.
[0040]
[0041] The processing device executes the sequence S3: in cycle T the value of the signal "N_ES_TRAZA" is "1", while "N_1_ES_TRAZA" is "0", and the value of the multiplexing signal "SEL_TR_P4" is "2", so that a route is enabled on which the trace instruction is synchronized with the execution of the instruction to be traced; in cycle T + 1, the trace instruction is located in stage 4 of the trace pipeline. In cycle T, in addition, the multiplexing signal "SEL_PIPE_TRAZA" takes the value "0", so that in cycle T + 1 a zero is found in stage 3 of the trace pipeline.
[0042]
[0043] In an embodiment of the invention, the processing device detects, during a cycle "T", a bubble in stage 3 of the instruction pipeline, so that the controller sets the route that loads a "0" into stages 3 and 4 of the trace pipeline, and a jump address "Z" is routed to the input register of the search stage. The detection of the bubble in stage 3 corresponds to setting to "1" the signal "BUBBLE_P3" and the signal "BUBBLE". The controller controls: the route to stage 3 by assigning "0" to the signal "SEL_PIPE_TRAZA"; the route to stage 4 by assigning "0" to the signal "SEL_TR_P4"; and the loading of the input register of the search stage by activating the signal "LD_DIR" and routing the address "Z" to said register by assigning "0" to the signal "SEL_DIR".
[0044]
[0045] In one embodiment of the invention, the processing device detects, during a cycle "T", a bubble in stage 4 of the instruction pipeline, so that the controller sets the route that loads a "0" into stages 3, 4 and 5 of the trace pipeline, and the jump address "Z" is routed to the input register of the search stage. The detection of the bubble in stage 4 corresponds to setting to "1" the signal "BUBBLE_P4" and the signal "BUBBLE". The controller controls: the route to stage 3 of the trace pipeline by assigning "0" to the signal "SEL_PIPE_TRAZA"; the route to stage 4 by assigning "0" to the signal "SEL_TR_P4"; the route to stage 5 by assigning "0" to the signal "SEL_TR_P5"; and the loading of the input register of the search stage by activating the signal "LD_DIR" and routing the address "Z" to this register by assigning "0" to the signal "SEL_DIR".
[0046]
[0047] In a second aspect of the invention, a RISC (Reduced Instruction Set Computer) processor is disclosed, which comprises a device for the parallel processing of program instructions and trace instructions according to any one of the previous embodiments of the first aspect of the invention.
[0048]
[0049] In a third aspect of the invention, a method for the parallel processing of program instructions and trace instructions is disclosed which, executed on a device for the parallel processing of program instructions and trace instructions as defined in any of the embodiments of the first aspect of the invention, processes an instruction and a trace instruction in parallel.
[0050]
[0051] BRIEF DESCRIPTION OF THE FIGURES
[0052]
[0053] To complement the description of the invention and to help give a better understanding of its characteristics, a set of drawings is included as an integral part of said description, in which the following is represented in an illustrative and non-limiting manner:
[0054]
[0055] Figure 1. Structure of the device for synchronization and parallel execution proposed in the invention.
[0056] Figure 2. Mealy machine of the data path controller.
[0057] Figure 3. Evolution of the data path for the instruction sequence S1.
[0058] Figure 4. Evolution of the data path for the instruction sequence S2.
[0059] Figure 5. Evolution of the data path for the instruction sequence S3.
[0060] Figure 6. Evolution of the data path in response to a bubble in stage 3 of the pipeline.
[0061] Figure 7. Evolution of the data path in response to a bubble in stage 4 of the pipeline.
[0062] Figure 8. Evolution of the data path after a bubble in the previous cycle.
[0063] Figure 9. State transition table of the data path controller.
[0064] Figure 10. Table of the outputs of the data path controller relating to the search stage.
[0065] Figure 11. Table of the outputs of the data path controller relating to the decoding stage.
[0066] Figure 12. Table of the outputs of the data path controller relating to stages 3, 4 and 5 of the trace instruction pipeline.
[0067]
[0068] In Figure 1 the elements of the synchronization and parallel execution device proposed in the invention are referenced. These elements are the following:
[0069] 100 Stage 1: instruction search
[0070] 101 Address selection module for the next instruction
102 Instruction search module with dual read port
103 Stage 2: duplicate decoding
[0071] 104 Device data path controller
[0072] 105 Inputs to the data path controller
[0073] 106 Outputs of the data path controller
[0074] 107 "INSTRUCTION N" decoding module of stage 2
108 "INSTRUCTION N + 1" decoding module of stage 2
109 Input selection multiplexer for stage 3 of the RISC processor instruction pipeline
[0075] 110 Input selection multiplexer for stage 3 of the trace instruction pipeline
[0076] 112 RISC processor instruction pipeline
[0077] 113 Trace instruction pipeline
[0078] 114 Stage 3 of the RISC processor instruction pipeline
115 Stage 3 of the trace instruction pipeline
[0079] 116 Input selection multiplexer for stage 4 of the trace instruction pipeline
[0080] 117 Stage 4 of the RISC processor instruction pipeline
118 Stage 4 of the trace instruction pipeline
[0081] 119 Input selection multiplexer for stage 5 of the trace instruction pipeline
[0082] 120 Stage 5 of the RISC processor instruction pipeline
121 Stage 5 of the trace instruction pipeline
[0083] 122 BUBBLE: Input to the data path controller that monitors the detection of bubbles
[0084] 123 Trace information output register
[0085] 124 WAIT: Input to the data path controller that monitors the wait in the instruction search
[0086] 125 BUBBLE_P3: Input signal to the controller that monitors the detection of a bubble in stage 3 of the RISC processor instruction pipeline
[0087] 126 BUBBLE_P4: Input signal to the controller that monitors the detection of a bubble in stage 4 of the RISC processor instruction pipeline
[0088] 127 LD_N: Output of the data path controller that controls the loading of the "INSTRUCTION N" decoding module of stage 2
[0089] 128 LD_N_1: Output of the data path controller that controls the loading of the "INSTRUCTION N + 1" decoding module of stage 2
[0090] 129 N_ES_TRAZA: Input signal to the controller that monitors whether the instruction decoded in the "INSTRUCTION N" element is of the trace type
[0091] 130 N_1_ES_TRAZA: Input signal to the controller that monitors whether the instruction decoded in the "INSTRUCTION N + 1" element is of the trace type
[0092] 131 SEL_PIPE_INSTR: Output of the data path controller that controls the input multiplexer of stage 3 of the RISC processor instruction pipeline
[0093] 132 SEL_PIPE_TRAZA: Output of the data path controller that controls the input multiplexer of stage 3 of the trace instruction pipeline
[0094] 133 SEL_TR_P4: Output of the data path controller that controls the input multiplexer of stage 4 of the trace instruction pipeline
[0095] 134 SEL_TR_P5: Output of the data path controller that controls the input multiplexer of stage 5 of the trace instruction pipeline
[0096] 135 TR_P5_ES_CERO: Signal that monitors whether stage 5 of the trace instruction pipeline stores a zero value
[0097] 136 LD_TR_OUT: Signal that controls the storage of the trace information in the output register. It takes the complement of the TR_P5_ES_CERO signal, so the register is only loaded when the trace information is nonzero.
[0098] 137 LD_DIR: Signal that controls the storage, in the input register of the search stage, of the address of the next instruction to search.
[0099] 138 SEL_DIR: Output of the data path controller that controls the input multiplexer of the input register of the search stage, which stores the address of the next instruction to search.
[0100] 139 Input register of the search stage, which stores the address of the next instruction to search.
[0101] 140 Input multiplexer of the input register of the search stage, which stores the address of the next instruction to search.
[0102]
[0103] DESCRIPTION OF AN EMBODIMENT OF THE INVENTION
[0104]
[0105] The invention consists of a device equipped with an internal processing structure that eliminates the execution-time overhead introduced by the code instrumentation used to measure worst-case execution time by means of hybrid analysis. This analysis combines static analysis of the code with measurements of the execution time on the deployment platform. Static analysis determines which instructions need to be traced and, by means of instrumentation techniques, adds tracing code after each instruction to be traced, so that the instant at which said instruction executes can be captured by support hardware and a logic analyzer. The code added after the instruction to be traced uniquely identifies the moment of execution of said instruction, but introduces an overhead that this invention eliminates. The device is capable of detecting the trace instructions and executing them in parallel, in synchrony with, and conditional on the complete execution of, the instruction that precedes each of them. The tracing process is thus non-intrusive with regard to execution time: because the traces execute in parallel, introducing them modifies neither the sequence nor the execution time of the program under analysis.
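The instrumentation step described above can be illustrated with a minimal sketch. It is not taken from the patent: the instruction representation, the `("TRACE", id)` tuple and the function name `instrument` are hypothetical, and identifiers start at 1 only because the text states that a zero in the trace pipeline means that there is nothing to trace.

```python
def instrument(program, traced_indices):
    """Insert a trace instruction after each instruction selected for tracing.

    `program` is a list of opaque instructions; `traced_indices` holds the
    positions chosen by the static analysis. The ("TRACE", id) encoding is a
    hypothetical stand-in for the specific trace instruction code.
    """
    instrumented = []
    next_id = 1  # 0 is avoided: a zero trace value is never written to the output register
    for index, instruction in enumerate(program):
        instrumented.append(instruction)
        if index in traced_indices:
            # The trace instruction immediately follows the instruction it qualifies.
            instrumented.append(("TRACE", next_id))
            next_id += 1
    return instrumented
```

On a conventional processor each inserted trace instruction would occupy its own slot in the instruction pipeline; the device described here instead executes it alongside the instruction that precedes it.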
[0106]
[0107] The device proposed in the invention uses a specific instruction code for the instrumentation, which its internal structure interprets as the trace qualifier of the instruction that precedes it. The main elements of this device, shown in Figure 1, are: 1) an instruction search stage (100), which has a module for calculating the instruction address (101) and an instruction search module with a dual read port (102); 2) a duplicate decoding stage (103); 3) a specific pipeline for the trace instructions (113); 4) an output register for the trace (123); 5) the data path, formed by a set of multiplexers (109, 110, 116 and 119); 6) the data path controller of the device (104), which determines, based on its state and the value of its inputs (105), the value of the outputs (106) that control the multiplexers (109, 110, 116 and 119), the loading of the registers associated with the different stages (107, 108, 114, 115, 117, 118, 120 and 121), and the output register (123). Both the inputs (105) and the outputs (106) are represented graphically in Figure 1 next to the label assigned to each signal.
[0108]
[0109] Thanks to the dual-port search stage (100), the device can load two instructions simultaneously into the decoding stage (103) so that they are decoded in parallel. The signal "WAIT" (124) of this stage models possible wait states in the search. It can be activated after a processor reset, or as a consequence of a jump taking effect, which causes the injection of bubbles into the processor instruction pipeline (112); these are monitored by the signals "BUBBLE_P3" (bubble injection in stage 3, 125), "BUBBLE_P4" (bubble injection in stage 4, 126), and their logical OR, labeled "BUBBLE" (122). The signal "WAIT" (124) is deactivated to notify the decoding stage (103) that the instructions are available to be loaded.
[0110]
[0111] The two instructions in the decoding stage always correspond to instructions stored in consecutive memory words. In Figure 1, the element labeled "INSTRUCTION N" (107) receives the first of the two instructions, while the element labeled "INSTRUCTION N + 1" (108) receives the next one. The values N and N + 1 are not consecutive physical memory addresses; they denote two instructions stored in consecutive memory words, regardless of the processor's word size in bytes, which in the usual case of a 32-bit RISC processor is 4.
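As a small illustration of this convention, the sketch below converts the word indices N and N + 1 into byte addresses. It assumes a byte-addressed memory and a default word size of 4 bytes; the function name `pair_byte_addresses` is hypothetical.

```python
WORD_SIZE_BYTES = 4  # assumed: 32-bit RISC processor, byte-addressed memory

def pair_byte_addresses(n, word_size=WORD_SIZE_BYTES):
    """Return the byte addresses of the instructions at word indices N and N + 1."""
    first = n * word_size
    # Consecutive words are word_size bytes apart, not 1 byte apart.
    return first, first + word_size
```

With 4-byte words, word indices 3 and 4 map to byte addresses 12 and 16.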
[0112]
[0113] In the decoding stage (103), as a consequence of decoding each of the instructions, it is determined whether each instruction is of the trace type or belongs to the rest of the instruction set, producing the signals labeled in Figure 1 as "N_ES_TRAZA" (129) and "N_1_ES_TRAZA" (130).
[0114]
[0115] The data path controller (104) uses the values of those signals and of the signals "WAIT" (124) and "BUBBLE" (122), together with its own state, to determine the route that the instructions will follow to the following stages. The controller configures the multiplexers (109, 110, 116 and 119) of the data path to ensure that a trace instruction is executed in synchrony with the preceding instruction. That execution becomes effective during the last stage (121) of the trace pipeline (113), where it is checked that the signal "TR_P5_ES_CERO" (135) is deactivated, in which case the signal "LD_TR_OUT" (136) is activated and the trace value is directed to the output register (123).
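The load condition of the output register can be sketched as follows. The signal names come from the text; modelling the register as a plain integer and the function name `update_trace_output` are assumptions made only for illustration.

```python
def update_trace_output(tr_p5_value, output_register):
    """Return the new content of the trace output register (123).

    tr_p5_value is the value held in stage 5 of the trace pipeline (121).
    """
    tr_p5_es_cero = (tr_p5_value == 0)   # signal TR_P5_ES_CERO (135)
    ld_tr_out = not tr_p5_es_cero        # signal LD_TR_OUT (136) is its complement
    # The register is loaded only when the trace information is nonzero.
    return tr_p5_value if ld_tr_out else output_register
```

A zero arriving at stage 5, for example after a bubble, therefore leaves the previously captured trace value untouched.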
[0116]
[0117] Figure 2 represents the Mealy machine of the data path controller of this device (200), which is formally specified in the tables of Figures 9, 10, 11 and 12.
[0118] Figures 3, 4 and 5 are, respectively, examples of how the controller sets the route, to make the synchronization effective, for the following possible instruction sequences S1, S2 and S3:
[0119] • The sequence S1 corresponds to instruction-trace pairs (301-302, 303-304 and 305-306) in which the instructions to be traced (301, 303 and 305) are always loaded into the "INSTRUCTION N" element (107), while the corresponding trace instructions (302, 304 and 306) are loaded into the "INSTRUCTION N + 1" element (108).
[0120] • The sequence S2 corresponds to a sequence of instructions (401, 402, 403 and 404) that are not traced, so that the elements "INSTRUCTION N" (107) and "INSTRUCTION N + 1" (108) are always loaded with instructions and not traces.
[0121] • The sequence S3 corresponds to two trace-instruction pairs (502-503 and 504-505) in which the trace instructions (503 and 505) are loaded in successive cycles (500 and 510) into the "INSTRUCTION N" element (107), while the instructions to be traced (502 and 504) are loaded in those same cycles into the "INSTRUCTION N + 1" element (108).
[0122]
[0123] Figure 3 shows the operation of the device during two clock cycles, T (300) and T + 1 (307), in which the processor executes the instruction sequence S1. In the sequence S1 the instructions stored at addresses X + 1 (302), X + 3 (304) and X + 5 (306) are the trace instructions of the preceding instructions, located respectively at addresses X (301), X + 2 (303) and X + 4 (305). The scheme shows, over the two cycles T (300) and T + 1 (307), how the trace instructions (302, 304, 306) added by the instrumentation are directed towards the stages (115 and 118) that belong to the trace instruction pipeline, while the instructions to be traced (301, 303 and 305) are directed to the stages (114 and 117) that belong to the processor instruction pipeline. In this way, the traced instruction and its trace instruction execute in synchrony, and the run-time overhead of inserting trace instructions in a program is avoided, since they run in parallel.
[0124] Figure 4 shows the operation of the device for the sequence S2, in which no trace instructions are loaded for two cycles. In that case, during the first cycle T (400) it is detected that neither of the two instructions loaded in the decoding stage (403 and 404) is a trace, and therefore the signals "N_ES_TRAZA" (129) and "N_1_ES_TRAZA" (130) are both 0. As a result, the decoding stage (103) does not load two new instructions at the beginning of cycle T + 1 (408), since in cycle T (400) the values of "LD_N" (127) and "LD_N_1" (128), which control said load, are both 0. In cycle T + 1 (408) the controller is in the state called "PENDING INSTR" (202), in which the pending instruction (404) located in the second decoding unit (108) is directed to stage 3 of the processor instruction pipeline (114). In both cycles, T (400) and T + 1 (408), the controller loads stage 3 of the trace instruction pipeline (115) with the value 0, so the instructions are not traced.
[0125]
[0126] Figure 5 shows the operation of the device for the sequence S3, which covers the case in which, in cycle T (500), the instruction to be traced (502) is in stage 3 of the instruction pipeline (114), while the trace instruction (503) is loaded in the "INSTRUCTION N" element (107) of the decoding stage (103). In that same cycle, moreover, the "INSTRUCTION N + 1" element (108) of the decoding stage (103) contains the following instruction (504) to be executed. According to this sequence, in cycle T (500) the value of the signal "N_ES_TRAZA" (129) is 1, while "N_1_ES_TRAZA" (130) is 0, and the value of the multiplexing signal "SEL_TR_P4" (133) is 2, which enables a path on which the trace instruction (503) is synchronized with the execution of the instruction to be traced (502). The synchronization becomes effective in cycle T + 1 (510), with the trace instruction (503) located in stage 4 of the trace pipeline (118). In cycle T (500), in addition, the multiplexing signal "SEL_PIPE_TRAZA" (132) takes the value 0, so that in cycle T + 1 (510) a zero (507) is found in stage 3 of the trace pipeline (115).
[0127] In addition, during cycle T (500) of the sequence S3, the following instruction (504), which is stored in the "INSTRUCTION N + 1" element (108), has its route to stage 3 of the instruction pipeline (114) enabled. To enable this route, the signal "SEL_PIPE_INSTR" (131) takes the value 1 during cycle T (500).
[0128] In cycle T + 1 (510), the device repeats the data path configuration of cycle T (500), since the sequence again places a trace instruction (505) in the "INSTRUCTION N" element (107) of the decoding stage (103) and the next instruction to be executed (506) in the "INSTRUCTION N + 1" element (108).
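The routing decisions described for the sequences S1, S2 and S3 can be summarized in a minimal sketch. It covers only the decision taken in cycle T from the two decode signals, not the full Mealy machine of Figures 9 to 12, which is not reproduced here. The function name, the dictionary keys and the assumption that in S2 the instruction in "INSTRUCTION N" advances to stage 3 in cycle T are illustrative; the combination in which both decoded instructions are traces is not described in the text and is rejected.

```python
def route_decoded_pair(n_es_traza, n_1_es_traza):
    """Map the decode signals to the sources feeding each pipeline stage in cycle T."""
    if n_es_traza == 0 and n_1_es_traza == 1:
        # S1: instruction in N, its trace in N + 1; both advance in parallel.
        return {"sequence": "S1", "instr_stage3": "N", "trace_stage3": "N+1"}
    if n_es_traza == 0 and n_1_es_traza == 0:
        # S2: no traces; N + 1 stays pending for the next cycle (PENDING INSTR).
        return {"sequence": "S2", "instr_stage3": "N", "trace_stage3": 0,
                "pending": "N+1"}
    if n_es_traza == 1 and n_1_es_traza == 0:
        # S3: the trace in N joins its instruction one stage ahead (SEL_TR_P4 = 2),
        # a zero enters trace stage 3 (SEL_PIPE_TRAZA = 0), and N + 1 is routed
        # to instruction stage 3 (SEL_PIPE_INSTR = 1).
        return {"sequence": "S3", "instr_stage3": "N+1", "trace_stage3": 0,
                "trace_stage4": "N"}
    raise ValueError("combination not described in the text")
```

In every described case exactly one program instruction enters stage 3 of the instruction pipeline per cycle, which is why the inserted traces do not lengthen execution.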
[0129]
[0130] Finally, Figures 6, 7 and 8 describe the operation of the device when bubbles are detected. Bubbles are inserted in the instruction pipeline of a RISC processor in all situations in which the sequential execution of instructions is interrupted, as is the case with jump instructions, both conditional and unconditional, and with function calls and returns. When an instruction interrupts the sequential order of execution, the processor must discard the execution of the instructions that follow it and start the search for the instruction whose address has been determined by executing the instruction that caused the break in sequence. In Figures 6 and 7 this address is labeled as address "Z" (600). Figure 6 explains the data path in response to a bubble in stage 3 of the instruction pipeline (114), while Figure 7 corresponds to a bubble detected in stage 4 (117), which also causes a bubble in stage 3 (114). Figure 8 explains the evolution of the data path in the cycles following the detection of a bubble, until the instruction at the jump address (600) is supplied by the search stage (100).
[0131]
[0132] Figure 6 shows how, during cycle T, a bubble is detected only in stage 3 of the instruction pipeline (114), and how the controller sets the route that loads a 0 into stages 3 (115) and 4 (118) of the trace pipeline (113), while the jump address "Z" (600) is routed to the input register of the search stage (139). The detection of the bubble in stage 3 corresponds to setting to 1 the signal "BUBBLE_P3" (125) and, consequently, the signal "BUBBLE" (122). The route to stage 3 (115) is controlled by assigning 0 to the signal "SEL_PIPE_TRAZA" (132), and the route to stage 4 (118) is controlled by assigning 0 to the signal "SEL_TR_P4" (133). The loading of the input register of the search stage (139) is controlled by activating the signal "LD_DIR" (137) and routing the address "Z" (600) to said register (139) by assigning 0 to the signal "SEL_DIR" (138).
[0133]
[0134] Figure 7 shows how, during cycle T, a bubble is detected in stage 4 of the instruction pipeline (117), and how the controller sets the route that loads a 0 into stages 3 (115), 4 (118) and 5 (121) of the trace pipeline (113), while the jump address "Z" (600) is routed to the input register of the search stage (139). The detection of the bubble in stage 4 corresponds to setting to 1 the signal "BUBBLE_P4" (126) and, consequently, the signal "BUBBLE" (122). The route to stage 3 of the trace pipeline (115) is controlled by assigning 0 to the signal "SEL_PIPE_TRAZA" (132), the route to stage 4 (118) by assigning 0 to the signal "SEL_TR_P4" (133), and the route to stage 5 (121) by assigning 0 to the signal "SEL_TR_P5" (134). The loading of the input register of the search stage (139) is controlled by activating the signal "LD_DIR" (137) and routing the address "Z" (600) to said register (139) by assigning 0 to the signal "SEL_DIR" (138).
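The two bubble cases differ only in whether stage 5 of the trace pipeline is also cleared, which the following minimal sketch makes explicit. The signal names and values are those given in the text; representing an activated "LD_DIR" as 1 (active-high) and returning an empty dictionary when there is no bubble are assumptions, since the remaining output values are defined by the tables of Figures 10 to 12, which are not reproduced here.

```python
def bubble_outputs(bubble_p3, bubble_p4):
    """Return the controller outputs forced by a bubble in the instruction pipeline."""
    bubble = bubble_p3 or bubble_p4  # BUBBLE (122) is the OR of BUBBLE_P3 and BUBBLE_P4
    if not bubble:
        return {}  # outputs then follow the controller tables, not modelled here
    forced = {
        "SEL_PIPE_TRAZA": 0,  # load 0 into stage 3 of the trace pipeline
        "SEL_TR_P4": 0,       # load 0 into stage 4 of the trace pipeline
        "LD_DIR": 1,          # load the search-stage input register (assumed active-high)
        "SEL_DIR": 0,         # select the jump address "Z" as its input
    }
    if bubble_p4:
        forced["SEL_TR_P5"] = 0  # a bubble in stage 4 also clears stage 5
    return forced
```

Clearing the trace stages together with the discarded instructions is what keeps a trace from reaching the output register when the instruction it qualifies did not complete.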
[0135]
[0136] Figure 8 represents the wait of two cycles (801 and 802) after either of the two bubbles described in figures 6 and 7, so that in the cycle T + 2 (802) the search stage (100) deactivates the signal "WAIT" (124) indicating that the instructions are available to be loaded in the decoding stage (103) in the next cycle, with the signals "LD_N" (127) and "LD_N_1" (128) being activated.
[0137]
[0138] The cases presented in Figures 3, 4, 5, 6, 7 and 8 describe how the device proposed in the invention behaves in response to the different instruction sequences and to the appearance of possible bubbles. In all cases it is verified that the device synchronizes the execution of the trace instructions with that of the traced instructions, avoiding run-time overhead, since the trace instructions are always executed in parallel with the instructions to be traced.
[0139]
[0140] The embodiment of the invention is based on the structural specification of Figure 1, on operation according to the state transition diagram of Figure 2, and on the tables defined in Figures 9, 10, 11 and 12. Figures 3, 4, 5 and 6 complete the details that facilitate the implementation.
[0141]
[0142] The preferred physical realization consists of the "Hardware / Firmware" implementation of the described functionality, starting from a description model of a standard processor architecture to which the aforementioned modifications, which basically affect the design of the pipeline, are applied. Such architecture description models make it possible to generate the manufacturing details of the device, which can be materialized on a programmable device such as an FPGA (Field Programmable Gate Array) or on an ASIC (Application Specific Integrated Circuit).
[0143]
[0144] There are different realization options. All of them start from the VHDL model of an "IP Core" of a segmented RISC processor, such as ARM or LEON, in which the implementation of the pipeline structure is modified to include the functionality described in this patent. The objective is to generate a new "IP Core" that can be manufactured on an FPGA or an ASIC.
Claims:
Claims (11)
[1]
1. A device for parallel processing of program instructions and trace instructions, characterized in that it comprises:
• an instruction search stage (100), which in turn comprises:
◦ a module for calculating the address of the instruction (101); and
◦ an instruction search module with a dual read port (102);
• a duplicate decoding stage (103);
• a trace pipeline (113) for the processing of trace instructions only;
• an output register (123) for the trace;
• a data path which in turn comprises a set of multiplexers (109, 110, 116, 119);
• a controller of the data path (104), which in turn comprises inputs (105) and outputs (106) that control the multiplexers (109, 110, 116, 119), the loading of the registers associated with the different stages (107, 108, 114, 115, 117, 118, 120, 121) and the output register (123) for the trace;
wherein the data path controller (104) is configured to determine, based on the state of said controller (104) and the value of the inputs (105) of said controller, the value of the outputs (106) that are sent to the multiplexers (109, 110, 116 and 119) of the data path, in such a way that a trace instruction is executed in synchronization with the preceding instruction, said execution becoming effective during the last stage (121) of the trace pipeline (113).
[2]
2. A device for parallel processing of program instructions and trace instructions, according to claim 1, characterized in that the controller comprises the following sequences of instructions:
• S1: corresponds to instruction-trace pairs (301-302, 303-304 and 305-306) in which the instructions to be traced (301, 303 and 305) are always loaded in the "INSTRUCTION N" element (107), while the corresponding trace instructions (302, 304 and 306) are loaded into the "INSTRUCTION N + 1" (115) element;
• S2: corresponds to a sequence of instructions (401, 402, 403 and 404) that are not traced, so that instructions are always loaded into the elements "INSTRUCTION N" (114) and "INSTRUCTION N + 1" (115);
• S3: corresponds to two trace-instruction pairs (502-503 and 504-505) in which the trace instructions (503 and 505) are loaded in successive cycles (500 and 510) in the "INSTRUCTION N 1" element (107), while the instructions to be traced (502 and 504) are loaded in those same cycles in the "INSTRUCTION N + 1" (108) element.
[3]
3. A device for parallel processing of program instructions and trace instructions, according to claim 2, characterized in that the processing device executes the sequence S1 during two clock cycles, T (300) and T + 1 (307), so that the instructions stored at addresses X + 1 (302), X + 3 (304) and X + 5 (306) are the trace instructions of the instructions that precede them, located, respectively, at addresses X (301), X + 2 (303) and X + 4 (305).
[4]
4. A device for parallel processing of program instructions and trace instructions, according to claim 2, characterized in that the processing device executes the sequence S2, in which no trace instructions are loaded, during two cycles; in such a way that during the first cycle T (400) it is detected that the two instructions that are loaded in the decoding stage (403 and 404) are not trace instructions, and therefore the signals "N_ES_TRAZA" (129) and "N_1_ES_TRAZA" (130) both take the value "0"; and, in cycle T + 1 (408), the controller is in the state called "PENDING INSTR" (202), in which the pending instruction (404) located in the second decoding unit (108) is directed to "stage 3" of the processor instruction pipeline (114).
[5]
5. A device for parallel processing of program instructions and trace instructions, according to claim 2, characterized in that the processing device executes the sequence S3: in the cycle T (500) the value of the signal "N_ES_TRAZA" (129) is "1", while "N_1_ES_TRAZA" (130) is "0", and the value of the multiplexing signal "SEL_TR_P4" (133) is "2", in such a way that a path is enabled through which the trace instruction (503) is synchronized with the execution of the instruction to be traced (502); in the cycle T + 1 (510), the trace instruction (503) is located in stage 4 of the trace pipeline (118). In the cycle T (500), in addition, the multiplexing signal "SEL_PIPE_TRAZA" (132) takes the value 0, so that in the cycle T + 1 (510) a zero (507) is found in stage 3 of the trace pipeline (115).
[6]
6. A device for parallel processing of program instructions and trace instructions, according to claim 1, characterized in that the processing device during a "T" cycle detects a bubble in stage 3 of the instruction pipeline (114), in such a way that the controller sets the route that loads a "0" in stages 3 (115) and 4 (118) of the trace pipeline (113) and a jump address "Z" (600) is routed to the input register of the search stage (139), so that the detection of the bubble in stage 3 corresponds to setting to "1" the signal "BURBUJA_P3" (125) and the signal "BURBUJA" (122).
[7]
7. A device for parallel processing of program instructions and trace instructions, according to claim 6, characterized in that the controller controls: the route to stage 3 (115) by assigning a "0" to the signal "SEL_PIPE_TRAZA" (132); the route to stage 4 (118) by assigning a "0" to the signal "SEL_TR_P4" (133); and the loading of the input register of the search stage (139) by activating the signal "LD_DIR" (137) and by routing the address "Z" (600) to said register (139) by assigning a "0" to the signal "SEL_DIR" (138).
[8]
8. A parallel processing device for program instructions and trace instructions, according to claim 1, characterized in that the processing device during a "T" cycle detects a bubble in stage 4 of the instruction pipeline (117), in such a way that the controller sets the route that loads a "0" in stages 3 (115), 4 (118) and 5 (121) of the trace pipeline (113) and the jump address "Z" (600) is routed to the input register of the search stage (139), so that the detection of the bubble in stage 4 corresponds to setting to "1" the signal "BURBUJA_P4" (126) and the signal "BURBUJA" (122).
[9]
9. A parallel processing device for program instructions and trace instructions, according to claim 8, characterized in that the controller controls: the route to stage 3 of the trace pipeline (115) by assigning a "0" to the signal "SEL_PIPE_TRAZA" (132); the route to stage 4 (118) by assigning a "0" to the signal "SEL_TR_P4" (133); the route to stage 5 (121) by assigning a "0" to the signal "SEL_TR_P5" (134); and the loading of the input register of the search stage (139) by activating the signal "LD_DIR" (137) and by routing the address "Z" (600) to said register (139) by assigning a "0" to the signal "SEL_DIR" (138).
[10]
10. A RISC ("Reduced Instruction Set Computer") processor, characterized in that it comprises a device for parallel processing of program instructions and trace instructions according to any one of the preceding claims.
[11]
11. A method of parallel processing of program instructions and trace instructions which, executed on a device for parallel processing of program instructions and trace instructions as defined in any of claims 1 to 9, processes an instruction and a trace instruction in parallel.
Similar technologies:
Publication number | Publication date | Patent title
US9483273B2|2016-11-01|Dependent instruction suppression in a load-operation instruction
US9606806B2|2017-03-28|Dependence-based replay suppression
KR101221512B1|2013-01-15|System and method of data forwarding within an execution unit
US7043416B1|2006-05-09|System and method for state restoration in a diagnostic module for a high-speed microprocessor
US20210311737A1|2021-10-07|Store-to-load forwarding
KR20110008298A|2011-01-26|Selectively performing a single cycle write operation with ecc in a data processing system
US20140380024A1|2014-12-25|Dependent instruction suppression
BR102013015049B1|2021-03-02|apparatus and method
US9003225B2|2015-04-07|Confirming store-to-load forwards
US10795685B2|2020-10-06|Operating a pipeline flattener in order to track instructions for complex
US8484520B2|2013-07-09|Processor capable of determining ECC errors
ES2697548A1|2019-01-24|A METHOD AND A PROCESSING DEVICE IN PARALLEL OF PROGRAM INSTRUCTIONS AND TRAIL INSTRUCTIONS |
KR20190033084A|2019-03-28|Store and load trace by bypassing load store units
US6934828B2|2005-08-23|Decoupling floating point linear address
US20140143521A1|2014-05-22|Instruction swap for patching problematic instructions in a microprocessor
ES2802723B2|2021-07-27|A METHOD FOR SELECTIVE TRACING OF INSTRUCTION EXECUTION, RELATED PROCESSING DEVICE AND PROCESSOR
US20140351556A1|2014-11-27|Methods for operating and configuring a reconfigurable processor
US9361104B2|2016-06-07|Systems and methods for determining instruction execution error by comparing an operand of a reference instruction to a result of a subsequent cross-check instruction
US9582286B2|2017-02-28|Register file management for operations using a single physical register for both source and result
US9411582B2|2016-08-09|Apparatus and method for processing invalid operation in prologue or epilogue of loop
US10719325B2|2020-07-21|System and method of VLIW instruction processing using reduced-width VLIW processor
KR20200088760A|2020-07-23|Checksum generation
US20110246747A1|2011-10-06|Reconfigurable circuit using valid signals and method of operating reconfigurable circuit
JP5467172B1|2014-04-09|Information processing system and information processing method
BR112015022683B1|2021-12-21|PROCESSING SYSTEM AND METHOD OF CARRYING OUT A DATA HANDLING OPERATION
Patent family:
Publication number | Publication date
WO2019180288A1|2019-09-26|
ES2697548B2|2020-07-22|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title
US6499123B1|1989-02-24|2002-12-24|Advanced Micro Devices, Inc.|Method and apparatus for debugging an integrated circuit|
US5564028A|1994-01-11|1996-10-08|Texas Instruments Incorporated|Pipelined data processing including instruction trace|
US5996092A|1996-12-05|1999-11-30|International Business Machines Corporation|System and method for tracing program execution within a processor before and after a triggering event|
US5933626A|1997-06-12|1999-08-03|Advanced Micro Devices, Inc.|Apparatus and method for tracing microprocessor instructions|
GB2492457A|2011-06-29|2013-01-02|Ibm|Predicting out of order instruction level parallelism of threads in a multi-threaded processor|
US20130290640A1|2012-04-27|2013-10-31|Nvidia Corporation|Branch prediction power reduction|
ES2802723A1|2019-07-12|2021-01-20|Univ Alcala Henares|A METHOD FOR SELECTIVE TRACING OF INSTRUCTION EXECUTION, RELATED PROCESSING DEVICE AND PROCESSOR |
Legal status:
2019-01-24| BA2A| Patent application published|Ref document number: 2697548 Country of ref document: ES Kind code of ref document: A1 Effective date: 20190124 |
2020-07-22| FG2A| Definitive protection|Ref document number: 2697548 Country of ref document: ES Kind code of ref document: B2 Effective date: 20200722 |
Priority:
Application number | Filing date | Patent title
ES201830266A|ES2697548B2|2018-03-20|2018-03-20|A PARALLEL PROCESSING METHOD AND DEVICE FOR PROGRAM INSTRUCTIONS AND TRACE INSTRUCTIONS|
PCT/ES2019/070176| WO2019180288A1|2018-03-20|2019-03-18|Method and device for parallel processing of program instructions and trace instructions|